HLA and red blood cell antigen genotyping in SARS-CoV-2 convalescent plasma donors

Aim: More data is required regarding the association between HLA allele and red blood cell (RBC) antigen expression in regard to SARS-CoV-2 infection and COVID-19 susceptibility. Methods: ABO, RhD, 37 other RBC antigens and HLA-A, B, C, DRB1, DQB1 and DPB1 were determined using high throughput platforms in 90 Caucasian convalescent plasma donors. Results: The AB group was significantly increased (2.2×, p = 0.018) and some HLA alleles were found to be significantly overrepresented (HLA-B*44:02, C*05:01, DPB1*04:01, DRB1*04:01 and DRB1*07:01) or underrepresented (A*01:01, B*51:01 and DPB1*04:02) in convalescent individuals compared with the local bone marrow registry population. Conclusion: Our study of infection-susceptible but non-hospitalized Caucasian COVID-19 patients contributes to the global understanding of host genetic factors associated with SARS-CoV-2 infection and severity.

tive, while A*03:02 was identified as a risk allele [21]. In silico binding affinity studies have shown that HLA-B*46:01 could increase susceptibility to disease, whereas HLA-B*15:03 is associated with protective immunity, and the same studies found that HLA-A*02:02, B*15:03 and C*12:03 were the most frequently encountered haplotypes associated with the presentation of viral epitopes [18]. Other retrospective studies aimed at identifying potentially protective or risk-increasing alleles have been published [22-28]. Such an effort requires several geographically diverse laboratories to analyze and share available data in order to generate a comprehensive overview.
In the present study, we are also advancing the hypothesis that associations might exist between certain HLA alleles or blood groups and SARS-CoV-2 susceptibility, as well as the ability to overcome infection without hospitalization. We therefore sought to analyze and determine the existence of any trends in the expression of an extended panel of RBC antigens (ABO, RhD and 37 other antigens), and of HLA-A, B, C, DRB1, DQB1 and DPB1 alleles within a Caucasian convalescent plasma donor cohort, in comparison to different reference frequencies (textbook, local and international databases, and literature). The identification of differential patterns of RBC antigen or HLA expression in convalescent individuals, who were infected but were not hospitalized, could contribute to a better understanding of SARS-CoV-2 susceptibility and COVID-19 severity.

Samples
We analyzed a cohort of 90 Caucasian convalescent plasma donors. Donors were randomly chosen from adult participants of the Québec cohort in the CONCOR-1 clinical trial (#NCT04348656). All subjects received an official diagnosis of COVID-19 by the Québec Provincial Health Authority after epidemiologic investigation or after confirmation by polymerase chain reaction (PCR) test. All subjects were symptomatic during infection, cleared the infection without hospitalization, and were free of symptoms for at least two weeks before donating. Because all COVID-19 diagnoses were made before spring 2021, subjects were presumably infected with the Alpha (B.1.1.7) or Beta (B.1.351) variant, as no case of the Delta (B.1.617.2) variant had yet been reported in the province of Québec [29]. All subjects were self-identified Caucasians. Our convalescent plasma donor cohort had an average age of 40.4 ± 15.0 years and consisted of 68% males. Donors were not selected with respect to their ABO group. All donors gave consent to participate in this research project, which was approved by the Héma-Québec Research Ethics Committee. Control populations were from the National Marrow Donor Program (NMDP) registry and from Héma-Québec's bone marrow donor registry. For the Héma-Québec control cohort, 1370 registered individuals typed at high resolution, from the Montréal and Montérégie regions and of self-reported Caucasian ethnicity, were selected to match the characteristics of the studied convalescent cohort. No variable other than geographical region and ethnicity was matched, and all donors were eligible to donate plasma in Québec.

Phenotyping & genotyping
ABO and RhD phenotype testing was done by serologic detection using the PK7300 from Beckman Coulter, as per the manufacturer's protocol. DNA used for genotyping was extracted from the buffy coat of whole blood samples collected in ethylenediaminetetraacetic acid (EDTA) tubes, using the QIAamp Blood Mini kit (Qiagen, Hilden, Germany). RBC genotyping was performed using the Luminex xMAP® technology with the ID CORE XT platform (Progenika Biopharma-Grifols, Bizkaia, Spain), as per the manufacturer's protocol, for the following blood group antigens: Rh (C, c, E, e, Cw, hrS, hrB, V, VS), Kell (K, k, Kpa, Kpb, Jsa, Jsb), Kidd (Jka, Jkb), Duffy (Fya, Fyb), MNS (M, N, S, s, U, Mia), Diego (Dia, Dib), Dombrock (Doa, Dob, Hy, Joa), Colton (Coa, Cob), Yt (Yta, Ytb) and Lutheran (Lua, Lub). HLA genotyping was done by next-generation sequencing (NGS) on a MiSeqDx (Illumina, CA, USA), using NGSgo®-AmpX v2 kits, and interpreted with NGSengine® v2.21 (both from GenDX, Utrecht, The Netherlands).

Statistical analyses
For RBC antigen and HLA allele comparisons in Tables 1-3, population-wide proportions were assumed to correspond to the Caucasian prevalence estimates (from the Blood Group Antigen FactsBook [30] and from the National Marrow Donor Program database [31]), while the HLA G-group allele frequencies of the 90 participants (180 individual HLA alleles) were estimated along with their Clopper-Pearson 95% confidence intervals. Z-tests for two proportions were used to test for statistical significance between population and observed antigen prevalence. A p value below the Bonferroni-corrected threshold for multiple comparisons within each antigen group was considered significant. Allele frequencies were calculated using the GENE[RATE] tool for HLA-A, B, C, DRB1, DQB1 and DPB1 [32]. HLA allelic frequencies were used with a modification of the hierfstat package to calculate the genetic distance (Latter) globally and for each locus between the subjects and the reference population [33,34]. Standardized residuals of the cohort subjects were calculated to identify alleles with significant differences between the subjects and the registry. Residuals were calculated by considering the subjects' allele frequencies as the independent variable and the registry frequencies as the dependent variable, for each locus independently. Frequencies were deemed different when the absolute standardized residual was 3 or greater. Statistical analyses were performed using R [35].
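As an illustration of the Clopper-Pearson interval mentioned above, the following is a minimal Python sketch that uses only the standard library. It is not the authors' R code: the function names (`binom_cdf`, `clopper_pearson`) and the example count of 20 copies out of 180 are hypothetical, and the bounds are found by simple bisection rather than through a beta quantile function.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes out of n trials."""

    def solve(f, target):
        # f(p) decreases as p increases on [0, 1]; bisect until f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p for which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p for which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical example: an allele observed 20 times among 2n = 180 alleles.
low, high = clopper_pearson(20, 180)
print(f"frequency = {20 / 180:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With 2n = 180 alleles per locus, such intervals are fairly wide for rare alleles, which is one reason small cohort-level differences are hard to establish.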

Red blood cell genotype frequencies
Genotype frequencies for Rh, Kell, MNS, Duffy, Kidd, Diego, Dombrock, Colton, Yt and Lutheran blood groups were determined in each individual, and the resulting predicted phenotypes were compared with the expected Caucasian reference frequencies. The FY*A/*A genotype (Fy(a+b-) predicted phenotype) appears to be overrepresented in our convalescent cohort compared with expected frequencies (0.256 vs 0.170, respectively), as presented in Table 1, although the difference is not significant after Bonferroni correction (p = 0.030). Conversely, FY*A/*B (Fy(a+b+) predicted phenotype) individuals appear to be trending toward a decreased frequency of 0.400 compared with the expected 0.490 (Table 1), although this trend is not significant either (p = 0.087). Overall, no antigen group combinations deviated significantly from the expected frequencies.
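The Duffy comparison above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the reference frequency is treated as a fixed population value (a one-sample z-test), the counts 23/90 and 36/90 are derived from the reported frequencies of 0.256 and 0.400, and the number of comparisons used for the Bonferroni threshold (`n_comparisons = 3`) is hypothetical because the source does not state it. The function name `z_test_vs_reference` is also hypothetical.

```python
from math import erfc, sqrt


def z_test_vs_reference(count, n, p_ref):
    """Two-sided z-test of an observed proportion against a fixed reference."""
    p_obs = count / n
    se = sqrt(p_ref * (1 - p_ref) / n)  # standard error under the reference
    z = (p_obs - p_ref) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


alpha = 0.05
n_comparisons = 3  # assumption: number of tests within the Duffy group
threshold = alpha / n_comparisons  # Bonferroni-corrected threshold

# Counts derived from the frequencies reported in the text (n = 90 donors).
for label, count, p_ref in [("FY*A/*A", 23, 0.170), ("FY*A/*B", 36, 0.490)]:
    z, p = z_test_vs_reference(count, 90, p_ref)
    verdict = "significant" if p < threshold else "not significant"
    print(f"{label}: z = {z:.2f}, p = {p:.3f} ({verdict} at {threshold:.4f})")
```

Under these assumptions the p values come out at roughly 0.03 and 0.09, close to those reported, and neither falls below the corrected threshold.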

ABO & RhD phenotyping
ABO and RhD phenotypes were determined for each convalescent individual and compared with the expected Caucasian reference frequencies for the entire cohort (n = 90) and within FY*A/*A individuals (n = 23), given the trend of this genotype toward overrepresentation (Table 1). Table 2 presents the ABO phenotyping analysis for the cohort, which identified a significant (p = 0.0178) 2.2× increase for the AB group compared with reference frequencies (0.089 vs 0.040, respectively). While nonsignificant (p = 0.110), the O group trends toward underrepresentation.

Table 3. Caucasian convalescent donor HLA allele frequency comparison with the NMDP database for HLA-A, B, C, DRB1 and DQB1. The g notation specifies G-groups corresponding to those presented in Gragert

HLA allele frequency comparisons
HLA typing was done by NGS in the Caucasian convalescent donor cohort and individual allele frequencies were determined. The convalescent cohort allele frequencies (2n = 180) were first compared with the NMDP registry frequencies for HLA-A, B, C, DRB1 and DQB1 for the most frequent alleles (Table 3), and no significant differences were identified in the most common alleles in the cohort. The convalescent cohort frequencies for HLA-A, B, C, DRB1, DQB1 and DPB1 were then compared with the Héma-Québec Stem Cell Donor Registry frequencies for Caucasians in the same geographical region (2n = 2740). The genetic distance between the convalescent donor cohort and the Héma-Québec Registry was calculated for all loci together and on a per-locus basis for HLA-A, B, C, DRB1, DQB1 and DPB1 (Table 4). No comparative measurement was available for the genetic distance; however, the distance is low at all loci compared. Standardized residual analysis was conducted (Table 5 & Figure 1) and led to the identification of alleles that were significantly different between the convalescent cohort (n = 90 individuals) and the Héma-Québec Registry (n = 1370 individuals).
HLA-B*44:02, C*05:01, DPB1*04:01, DRB1*04:01 and DRB1*07:01 were significantly overrepresented within our cohort, while A*01:01, B*51:01 and DPB1*04:02 were significantly underrepresented. Finally, the alleles identified as significantly different were used to search previously published data for suspected associations with SARS-CoV-2 infection and COVID-19 disease characteristics; these results are presented in Table 5. Of note, for two of the eight alleles that we have identified (DPB1*04:02 and DRB1*07:01), there was no available data in the literature for comparison.
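The standardized residual screen used to flag these alleles can be sketched as follows. This Python example is an assumption-laden illustration rather than the authors' R code: it uses one common definition (internally studentized residuals of a simple linear regression, the quantity R's rstandard() reports for a linear model), the function name `standardized_residuals` is hypothetical, and the allele labels and frequencies are made up for demonstration, not taken from the study.

```python
from math import sqrt


def standardized_residuals(x, y):
    """Internally studentized residuals of the least-squares fit y ~ x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e * e for e in resid) / (n - 2))  # residual standard error
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [e / (s * sqrt(1 - h)) for e, h in zip(resid, leverage)]


# Hypothetical frequencies for 15 alleles at one locus (not study data).
alleles = [f"allele_{i:02d}" for i in range(1, 16)]
cohort = [0.01 * i for i in range(1, 16)]  # independent variable
registry = [c + 0.001 * (-1) ** i for i, c in enumerate(cohort, start=1)]
registry[7] = cohort[7] + 0.05  # one allele that departs from the trend

residuals = standardized_residuals(cohort, registry)
flagged = [a for a, r in zip(alleles, residuals) if abs(r) >= 3]
print(flagged)
```

Because the registry frequency is the dependent variable in this setup, a positive residual corresponds to an allele that is more frequent in the registry than the cohort frequency predicts, that is, relatively underrepresented in the cohort. Note that with this definition the absolute residual cannot exceed the square root of (n - 2), so a cutoff of 3 can only be reached when a locus has at least 11 alleles in the fit.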

Discussion
Our study took a deeper look into the RBC and HLA characteristics of a COVID-19 recovered and non-hospitalized cohort enrolled in the CONCOR-1 convalescent plasma study. We used ABO and RhD automated blood donor testing, and RBC and HLA high throughput genotyping platforms to determine the existence of potential trends regarding the frequencies of ABO, RhD, and 37 other RBC antigens and HLA genotypes within the cohort, compared with reference Caucasian populations from textbooks, public databases, and our local bone marrow donor registry. A significant AB blood group overrepresentation was identified, as well as a nonsignificant trend in FY*A/*A individuals. These results suggest a possible involvement of ABO and Duffy red blood cell antigens in SARS-CoV-2 susceptibility and COVID-19 severity, as all these individuals contracted the virus, yet only had mild symptoms, and cleared the infection without needing hospitalization. The case for HLA association with disease susceptibility and severity is more complex. Overall, the genetic distance calculated from HLA allele frequencies is low (below 0.01) and suggests the cohort is similar to the reference population chosen. When looking at allele-level frequencies, eight HLA alleles were identified by the standardized residuals analysis as potential markers distinguishing the convalescent cohort from a stem cell registry from the same geographical region. One of the limitations of this study concerns the lack of a more diverse stratification of disease severity and the limited sample size for COVID-19-affected individuals. Indeed, our study lacks blood group and HLA data from hospitalized, deceased and asymptomatic COVID-19 patients, which would be of interest given that there could be a significant link between ABO and severity [39].
Additionally, while the historical ABO frequencies of Québec Caucasian blood donors (internal data) matched those of the FactsBook, such unbiased information about the frequencies of other RBC antigens is not currently available, hence the use of FactsBook Caucasian reference frequencies. Our study also does not directly address the major RBC and HLA antigen frequency differences between ethnicities. While our convalescent plasma donation program reflected our donor pool [40], the number of non-Caucasian individuals was too low to conduct statistical analysis, which is unfortunate given the importance of understanding the disproportionate impact of COVID-19 on minorities [5,41]. Nonetheless, the identification of a potential overrepresentation of FY*A/*A within Caucasian COVID-19 convalescent individuals and the potential implication of the Duffy blood group could have an impact on future research directions. The trend toward overrepresentation of the Fy(a+b-) predicted phenotype among our COVID-19 convalescent cohort could be explained by the absence of the Fyb antigen, since no significant difference was observed when comparing Fy(a+b+) and Fy(a-b+). In individuals of African descent, the Fy(a-b-) phenotype is caused by a GATA box mutation upstream of the FY gene, which silences Fyb expression in RBCs [42]. Given that 67% of African Americans (AA) are Duffy null [43], and that Duffy null patients have an increased mortality rate from acute lung injury [4], some groups have already hypothesized a role for Duffy in AA individuals with COVID-19 [43]. We therefore suggest that Duffy allele identification might be used to select individuals at risk of COVID-19 complications for further research on associations between COVID-19 and RBC antigens. The involvement of ABO blood groups in COVID-19 has previously been described [13,44].
The mechanism underlying the association remains elusive, but could be related to circulating natural anti-A and anti-B antibodies, or a low-efficiency furin cleavage in O-group individuals [9]. The significant overrepresentation of the AB group, and the nonsignificant trend toward underrepresentation of the O group within our cohort, appear to be in agreement with other groups' suggestion that O individuals could be less susceptible to SARS-CoV-2 infection [11,12]. Our sample size does not allow us to draw conclusions as to whether individuals from the AB group are more susceptible to infection and more efficient at clearing infection without hospitalization, or whether this bias is a consequence of the trend toward underrepresentation of O individuals, who are less susceptible to infection. It would be interesting to extend our observations to larger cohorts that include severely affected patients.
The involvement of HLA alleles in the disease outcome of COVID-19 patients is becoming increasingly evident [45,46]. Various HLA alleles have shown high binding affinity to SARS-CoV-2 peptides [18,47], and trends have been observed in many populations [22,23]. While our sample size is limited, eight alleles were identified as significantly different between the studied cohort and matched individuals from the Héma-Québec Stem Cell Donor Registry, for which we already had high resolution HLA information. Two of these eight HLA alleles have not previously been identified for their association with COVID-19: DPB1*04:02 and DRB1*07:01. The underrepresentation of HLA-A*01:01 is in agreement with its suggested association with high risk and mortality in COVID-19 individuals [36,37], and the overrepresentation of HLA-B*44:02 is compatible with an increased susceptibility to infection [26]. Interestingly, A*01:01, B*44:02 and B*51:01 were predicted to be weak binders of SARS-CoV-2 peptides, and none of the other alleles we identified were found to be strong peptide binders [48]. Given that we found A*01:01 and B*51:01 to be underrepresented and B*44:02 overrepresented in our cohort, it is difficult to establish a relationship between these data without considering complete haplotypes or other disease severity groups. The other associations are inconclusive, but should be considered in larger cohorts. Overall, more data is required from more diverse populations in order to develop a comprehensive view and a better understanding to manage emerging variants and infection waves.
Altogether, we provide additional information regarding the role of RBC antigens and HLA in SARS-CoV-2 susceptibility, and consequential COVID-19 susceptibility, severity, resolution and long-term clinical consequences. More research needs to be done to get a better understanding of potentially at-risk populations, and for the identification of molecular pathways of this virus.

Conclusion
This study provides insights into the importance of considering RBC and HLA antigens with regard to the susceptibility, severity, resolution and long-term clinical consequences of COVID-19. Genetic markers such as HLA could help focus prevention efforts on potentially more at-risk populations, and help organize vaccination strategies.

Summary points
• ABO, RhD, 37 other RBC antigens and HLA-A, B, C, DRB1, DQB1 and DPB1 were determined using high throughput platforms in 90 Caucasian convalescent plasma donors.
• The AB group was significantly increased in convalescent individuals and the FY*A/*A genotype was trending as overrepresented compared with the expected frequencies.
• HLA typing was performed on the convalescent donors and their allele frequencies were compared with the NMDP registry and the Héma-Québec Stem Cell Donor Registry. Standardized residual analysis identified eight alleles that were significantly different between the convalescent cohort and the Héma-Québec Registry.